Information Imperfection Processing in Supervised Classification Systems

Authors

ANAS DAHABIAH

JOHN PUENTES

Abstract

Along with possibility theory, fuzzy relation composition rules will be used in our novel approach to deal with the imperfection and uncertainty that can affect the information elements of any classification system. Such imperfection arises at the level of the descriptors of the dataset and training-set objects, which can take imprecise, probabilistic, possibilistic, or even missing values, and also when classes are assigned to objects with different strength degrees. In addition, experts' ambiguous knowledge of the attributes and objects under consideration must also be taken into account in classification systems. These three types of imperfection will be handled within a simple unified framework, followed by a detailed illustrative example.

Key-Words: Possibility Theory, Fuzzy Relation Composition, Information Imperfection and Uncertainty.

1 Problem Description

Information imperfection is one of the most important problems in classification and pattern recognition that still lacks a unified, complete framework. This thorny issue materializes mainly in three essential types. The first type is encountered at the level of the object descriptors themselves (called features, characteristics, or attributes), which can carry imperfect informational content, either because of the flood of data or because they are the outputs of other automatic systems that precede this step [1]. For instance, an attribute can take an imprecise value, such as "the age of the patient is between 25 and 30" (quantitative imprecision) or "the pathology is either hernia grade I or hernia grade II" (qualitative imprecision). The values of other attributes could also be given via probability, evidence, or possibility distributions [2]. Some descriptor values may also be missing, which further complicates the process.
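The imperfect attribute values listed above can be encoded in possibility theory in a simple way. The following is a minimal Python sketch under stated assumptions, not the representation used in [1][2]: attribute domains are assumed finite (discretized), an interval or a set of values becomes a 0/1 possibility distribution, and a missing value becomes total ignorance (every value fully possible). The possibility Π(E) = max of π over E and the necessity N(E) = 1 − Π(complement of E) of a crisp event are the standard definitions; probabilistic and evidential values are not covered, and all function names are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Set

# Hypothetical encoding: a possibility distribution over a finite
# (discretized) attribute domain, mapping each value to a degree in [0, 1].
PossDist = Dict[Hashable, float]


def from_interval(domain: Iterable[float], low: float, high: float) -> PossDist:
    """Quantitative imprecision, e.g. 'the age is between 25 and 30'."""
    return {v: 1.0 if low <= v <= high else 0.0 for v in domain}


def from_set(domain: Iterable[Hashable], allowed: Set[Hashable]) -> PossDist:
    """Qualitative imprecision, e.g. 'hernia grade I or hernia grade II'."""
    return {v: 1.0 if v in allowed else 0.0 for v in domain}


def missing(domain: Iterable[Hashable]) -> PossDist:
    """Missing value: total ignorance, so every value is fully possible."""
    return {v: 1.0 for v in domain}


def possibility(pi: PossDist, event: Set[Hashable]) -> float:
    """Pi(E): largest possibility degree of a value in E (0.0 if none)."""
    return max((pi[v] for v in event if v in pi), default=0.0)


def necessity(pi: PossDist, event: Set[Hashable]) -> float:
    """N(E) = 1 - Pi(complement of E)."""
    complement = {v for v in pi if v not in event}
    return 1.0 - possibility(pi, complement)


age = from_interval(range(0, 101), 25, 30)
print(possibility(age, set(range(28, 41))), necessity(age, set(range(28, 41))))
# 1.0 0.0  (age in [28, 40] is possible but not certain)
print(necessity(age, set(range(20, 36))))
# 1.0      (age in [20, 35] is certain)

hernia = from_set(["I", "II", "III"], {"I", "II"})
print(possibility(hernia, {"III"}))
# 0.0      (grade III is excluded)
```

With a missing value, any proper non-empty event gets possibility 1 and necessity 0, which expresses that nothing is known about the attribute.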
All these forms of imperfection in the object descriptors must be considered when designing any robust classification system.

The second type of imperfection in classification systems results from the experts' ambiguous knowledge about the resemblance and tolerance to be applied during processing: experts' opinions and viewpoints sometimes ought to be taken into account when classifying objects, which is called "personalization". This process enables experts to describe, in a fuzzy manner, to what extent they consider two values of a given attribute to be similar. For example, taking a patient record as an object, some measurements and analyses can take very small values such as 0.0021, 0.0022, ..., 0.0028. In this case, the doctor may consider two values of such an attribute similar if their difference does not exceed 0.0001, whereas for another attribute taking values between $2 \times 10^6$ and $6 \times 10^6$, two values might be considered similar if their difference does not exceed 1000. Of course, if the difference is null, the values are completely similar, and the similarity decreases as the difference increases until it vanishes once the difference exceeds 1000. These two aspects of imperfection (at the levels of the object descriptors and of expert knowledge) affect and complicate the measurement of similarity between the objects of the dataset and those of the training set, which plays a vital role in any classification system.

The third type of imperfection can occur when labeling the training patterns of the learning dataset in supervised classification [1][3]. In this case, it is not always possible to assign each object to exactly one class (label, decision, or hypothesis); instead, an object may be associated with several labels with different strength degrees (membership degrees).
This typically happens when the object classes are assigned by an automatic system [4]. Allowing several labels lets users account for the real state of the objects, instead of having to find tools that force a single decision about each object's class. For example, a doctor may simultaneously believe in different diseases, pathologies, lesions, or medications, with different trust degrees, according to the evidence available to him or her (obtained from the descriptors in the patient record, the medical images of the organ concerned, etc.). Unfortunately, to date no work has considered these three aspects of imperfection within a unified simple framework, although they are commonly encountered together in the very large real-world databases handled in various data mining tasks.

RECENT ADVANCES in ARTIFICIAL INTELLIGENCE, KNOWLEDGE ENGINEERING and DATA BASES, ISSN: 1790-5109, ISBN: 978-960-474-154-0

2 Prior Works

Despite its importance, stressed in various recent studies, the aforementioned problem has only been partially addressed in the literature. The third type of imperfection has been considered in the interesting work on classifier design using an evidential approach [3], but under the constraint that the class membership degrees of each object must sum to 1 so that the belief masses can be calculated, even though this requirement is not always easy to satisfy. In addition, the first two types of imperfection have not been considered in these works. Nonetheless, the need to take them into account in order to compute similarity and to carry out the different steps of any data mining system [5][6], or of any artificial intelligence and machine learning process [7], has been pointed out explicitly in recent research [8].
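The second type of imperfection, and the way it complicates similarity computation between imperfect values, can be illustrated with a small Python sketch. It is not the authors' method: the linear tolerance profile and the names `tolerance_similarity` and `match_degrees` are hypothetical, and the two values are assumed non-interactive. Similarity is 1 for equal values and decays linearly to 0 at the expert's tolerance δ (1000 in the doctor's example). The possibility and necessity that two imperfect values are similar use the standard possibility-theory formulas for a fuzzy event, Π = max min(π1, π2, S) and N = 1 − max min(π1, π2, 1 − S).

```python
from itertools import product
from typing import Callable, Dict, Tuple

PossDist = Dict[float, float]  # value -> possibility degree
Resemblance = Callable[[float, float], float]


def tolerance_similarity(delta: float) -> Resemblance:
    """Hypothetical expert resemblance for one attribute.

    Similarity is 1 for equal values, decreases linearly with |x - y|,
    and is 0 once the difference reaches the expert's tolerance delta.
    """
    if delta <= 0:
        raise ValueError("delta must be positive")

    def similarity(x: float, y: float) -> float:
        return max(0.0, 1.0 - abs(x - y) / delta)

    return similarity


def match_degrees(pi1: PossDist, pi2: PossDist, sim: Resemblance) -> Tuple[float, float]:
    """Possibility and necessity that two imperfect values are similar.

    Assumes the two values are non-interactive, so their joint distribution
    is min(pi1(u), pi2(v)); 'similar' is the fuzzy event sim(u, v).
    """
    pairs = list(product(pi1, pi2))
    poss = max(min(pi1[u], pi2[v], sim(u, v)) for u, v in pairs)
    nec = 1.0 - max(min(pi1[u], pi2[v], 1.0 - sim(u, v)) for u, v in pairs)
    return poss, nec


sim_large = tolerance_similarity(1000.0)   # attribute in [2e6, 6e6]
print(sim_large(2_000_000, 2_000_500))      # 0.5

crisp = {2_000_000.0: 1.0}                      # precise value
either = {2_000_000.0: 1.0, 2_000_500.0: 1.0}   # "one of these two values"
print(match_degrees(crisp, either, sim_large))  # (1.0, 0.5)
```

The gap between the two degrees reflects the imprecision of the second value: similarity is fully possible, but only half certain.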
Accordingly, we previously proposed a method essentially based on possibility theory that takes the first two aspects of imperfection into account [2], and then improved this technique by also taking the uncertainty of class assignment into account [1]. Nevertheless, all these methods are limited to systems that satisfy the unity class membership sum condition. In this paper, possibility theory is combined with fuzzy relation composition rules to improve and generalize the proposed classification system so that it accounts for all three types of imperfection in a simple, constraint-free design. In the following, we introduce the mathematical basis of the proposed method, which is presented in section 4 and clarified by an illustrative example in section 5; a brief conclusion with some perspectives is given in the last section.

3 Basic Mathematical Background

In this section, we briefly explain two important notions of fuzzy set theory on which our approach is essentially based. The first one, related to fuzzy propositions [9], is used to calculate the similarity between objects with imperfect information elements, while taking into account the expert's ambiguous knowledge about the resemblance between two values of each descriptor. The second one, the main rule of fuzzy relation composition [10], is used to compose both the possibility-based and the necessity-based fuzzy relations describing the resemblance between the observed and the training objects with the fuzzy relation describing the training objects' class membership, in order to obtain the possibility and necessity degrees of the relation between each observed object and each category of the class set.

3.1 Fuzzy Proposition

For a variable given via a 3-tuple information element $(V, \Omega, T_V)$, where $V$ is the variable name defined on the universe $\Omega$ and $T_V = \{A_1, A_2, \ldots\}$ is the set of basic fuzzy characterizations of $V$, the statement "$V$ is $A$", defined by means of a normalized fuzzy set $A$ of $\Omega$, is called an elementary or atomic fuzzy proposition. A compound fuzzy proposition is obtained by combining several atomic fuzzy propositions such as "$V$ is $A$" and "$W$ is $B$". The simplest compound fuzzy proposition is the conjunction of elementary fuzzy propositions "$V$ is $A$ and $W$ is $B$" for two variables $V$ and $W$ defined on the universes $\Omega_1$ and $\Omega_2$, respectively (for instance, "the glycaemia level is abnormal and the cholesterol level is high"). It is associated with the Cartesian product of the fuzzy sets $A$ of $\Omega_1$ and $B$ of $\Omega_2$, characterizing the pair $(V, W)$ on $\Omega_1 \times \Omega_2$. Its truth value is defined by $\min(\mu_A(\omega_1), \mu_B(\omega_2))$, or more generally by $T(\mu_A(\omega_1), \mu_B(\omega_2))$ for a t-norm $T$, at any $(\omega_1, \omega_2)$ of $\Omega_1 \times \Omega_2$. Such fuzzy propositions are very common in the rules of knowledge-based systems and in fuzzy control. Similarly, elementary propositions can be combined by a disjunction of the form "$V$ is $A$ or $W$ is $B$". The truth value of this fuzzy proposition is defined by $\max(\mu_A(\omega_1), \mu_B(\omega_2))$, or more generally by $\bot(\mu_A(\omega_1), \mu_B(\omega_2))$ for a t-conorm $\bot$, at any $(\omega_1, \omega_2)$
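The two notions of this section can be sketched together in Python. `conjunction` and `disjunction` evaluate the truth value of a compound fuzzy proposition from the membership degrees $\mu_A(\omega_1)$ and $\mu_B(\omega_2)$, with min and product as t-norms and max and the probabilistic sum as their dual t-conorms. For the composition, the classical sup-min (max-min) rule is assumed, since this excerpt does not state the exact rule of [10] or how the necessity-based relation is composed. All function names and the numeric degrees (including the glycaemia and cholesterol values) are hypothetical. The class-membership relation deliberately has rows that do not sum to 1, which the composition does not require.

```python
from typing import Callable, List

Relation = List[List[float]]
Binary = Callable[[float, float], float]

# Common t-norms (conjunction) and their dual t-conorms (disjunction).
t_min: Binary = lambda a, b: min(a, b)
t_prod: Binary = lambda a, b: a * b
s_max: Binary = lambda a, b: max(a, b)
s_prob: Binary = lambda a, b: a + b - a * b  # probabilistic sum


def conjunction(mu_a: float, mu_b: float, t: Binary = t_min) -> float:
    """Truth value of 'V is A and W is B' at (w1, w2) from mu_A(w1), mu_B(w2)."""
    return t(mu_a, mu_b)


def disjunction(mu_a: float, mu_b: float, s: Binary = s_max) -> float:
    """Truth value of 'V is A or W is B' at (w1, w2) from mu_A(w1), mu_B(w2)."""
    return s(mu_a, mu_b)


def sup_t_compose(r: Relation, s: Relation, t: Binary = t_min) -> Relation:
    """(R o S)(x, z) = max over y of t(R(x, y), S(y, z)); sup-min when t = min.

    r relates X to Y and s relates Y to Z, so len(r[0]) must equal len(s).
    """
    n_z = len(s[0])
    return [[max(t(r_xy, s[y][z]) for y, r_xy in enumerate(row)) for z in range(n_z)]
            for row in r]


# 'glycaemia is abnormal' holds to degree 0.75, 'cholesterol is high' to 0.5.
print(conjunction(0.75, 0.5), conjunction(0.75, 0.5, t_prod))  # 0.5 0.375
print(disjunction(0.75, 0.5), disjunction(0.75, 0.5, s_prob))  # 0.75 0.875

# Resemblance of one observed object to two training objects, and class
# memberships of those training objects (rows need not sum to 1).
resemblance = [[0.8, 0.3]]
class_membership = [[1.0, 0.2],
                    [0.4, 0.9]]
print(sup_t_compose(resemblance, class_membership))  # [[0.8, 0.3]]
```

Passing `t_prod` instead of `t_min` to `sup_t_compose` gives the sup-product composition, another common choice.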


Similar resources

Composite Kernel Optimization in Semi-Supervised Metric

Machine-learning solutions to classification, clustering and matching problems critically depend on the adopted metric, which in the past was selected heuristically. In the last decade, it has been demonstrated that an appropriate metric can be learnt from data, resulting in superior performance as compared with traditional metrics. This has recently stimulated a considerable interest in the to...


Using Supervised Clustering Technique to Classify Received Messages in 137 Call Center of Tehran City Council

Supervised clustering is a data mining technique that assigns a set of data to predefined classes by analyzing dataset attributes. It is considered as an important technique for information retrieval, management, and mining in information systems. Since customer satisfaction is the main goal of organizations in modern society, to meet the requirements, 137 call center of Tehran city council is ...


Hyperspectral Image Classification Based on the Fusion of the Features Generated by Sparse Representation Methods, Linear and Non-linear Transformations

The ability of recording the high resolution spectral signature of earth surface would be the most important feature of hyperspectral sensors. On the other hand, classification of hyperspectral imagery is known as one of the methods to extracting information from these remote sensing data sources. Despite the high potential of hyperspectral images in the information content point of view, there...


Application of remote sensing and geographical information system in mapping land cover of the national park

The study was conducted with the objective of mapping landscape cover of Nechsar National park in Ethiopia to produce spatially accurate and timely information on land use and changing pattern. Monitoring provides the planners and decision-makers with required information about the current state of its development and the nature of changes that have occurred. Remote sensing and Geographical Inf...


Object-Oriented Method for Automatic Extraction of Road from High Resolution Satellite Images

As the information carried in a high spatial resolution image is not represented by single pixels but by meaningful image objects, which include the association of multiple pixels and their mutual relations, the object based method has become one of the most commonly used strategies for the processing of high resolution imagery. This processing comprises two fundamental and critical steps towar...


Publication date: 2010
